Neural Network Based Document Clustering Using WordNet Ontologies
Abstract
Three novel text vector representation approaches for neural network based document clustering are proposed. The first is the extended significance vector model (ESVM), the second is the hypernym significance vector model (HSVM) and the last is the hybrid vector space model (HyM). ESVM extracts the relationship between words and their preferred classified labels. HSVM exploits a semantic relationship from the WordNet ontology. A more general term, the hypernym, substitutes for terms with similar concepts. This hypernym semantic relationship supplements the neural model in document clustering. HyM is a combination of a TFxIDF vector and a hypernym significance vector, which combines the advantages and reduces the disadvantages from both unsupervised and supervised vector representation approaches. According to our experiments, the self-organising map (SOM) model based on the HyM text vector representation approach is able to improve classification accuracy and to reduce the average quantization error (AQE) on 10,000 full-text articles.
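The idea behind HSVM and HyM can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that a term's significance for a class is its frequency in that class divided by its frequency over all classes, that a document's significance vector is the average over its terms, and that HyM concatenates the TFxIDF vector with the hypernym significance vector. The `HYPERNYMS` dictionary is a hypothetical stand-in for WordNet lookups, and all function names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy hypernym map standing in for WordNet lookups (assumption).
HYPERNYMS = {"dog": "animal", "cat": "animal", "car": "vehicle", "bus": "vehicle"}


def to_hypernyms(tokens):
    """Replace each term by its hypernym when one is known, else keep it."""
    return [HYPERNYMS.get(t, t) for t in tokens]


def significance_table(docs, labels):
    """Per-term class distribution: freq(term, class) / freq(term, all classes)."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # term -> Counter mapping class -> frequency
    for tokens, label in zip(docs, labels):
        for t in tokens:
            counts[t][label] += 1
    table = {}
    for term, per_class in counts.items():
        total = sum(per_class.values())
        table[term] = [per_class[c] / total for c in classes]
    return table, classes


def doc_significance(tokens, table, n_classes):
    """Average the significance vectors of the terms found in the table."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * n_classes
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(n_classes)]


def tfidf(tokens, vocab, doc_freq, n_docs):
    """Plain TFxIDF: raw term count times log(N / document frequency)."""
    tf = Counter(tokens)
    return [tf[t] * math.log(n_docs / doc_freq[t]) for t in vocab]


def hybrid_vectors(docs, labels):
    """Concatenate a TFxIDF vector with a hypernym significance vector (assumed HyM)."""
    hyper_docs = [to_hypernyms(d) for d in docs]
    table, classes = significance_table(hyper_docs, labels)
    vocab = sorted({t for d in docs for t in d})
    doc_freq = Counter(t for d in docs for t in set(d))
    vectors = [
        tfidf(raw, vocab, doc_freq, len(docs))
        + doc_significance(hyp, table, len(classes))
        for raw, hyp in zip(docs, hyper_docs)
    ]
    return vectors, vocab, classes
```

Each resulting vector has one TFxIDF component per vocabulary term followed by one significance component per class. Such vectors could then be used as input to a SOM.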
Similar resources
Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering
In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...
Knowledge Source Discovery: An Experience Using Ontologies, WordNet and Artificial Neural Networks
This paper describes our continuing research on ontology-based knowledge source discovery on the Semantic Web. The research documented here is focused on discovering distributed knowledge sources from a user query using an Artificial Neural Network model. An experience using the WordNet multilingual database for the translation of the terms extracted from the user query and for their codification ...
Word-sense disambiguation in biomedical ontologies
With the ever-increasing volume of biomedical literature, text mining has emerged as an important technology to support bio-curation and search. Word sense disambiguation (WSD), the correct identification of terms in text in the light of ambiguity, is an important problem in text mining. Since the late 1940s many approaches based on supervised (decision trees, naive Bayes, neural networks, support vecto...
Ontology-based Distance Measure for Text Clustering
Recent work has shown that ontologies are useful for improving the performance of text clustering. In this paper, we present a new clustering scheme based on an ontology-based distance measure. Before the clustering process is carried out, a term mutual information matrix is calculated with the aid of WordNet and some methods of learning ontologies from textual data. Combining this mutual informatio...
An Efficient Technique to Implement Similarity Measures in Text Document Clustering using Artificial Neural Networks Algorithm
Pattern recognition, supervised and unsupervised methods, optimization, associative memory and process control are some of the diverse problems that can be addressed by artificial neural networks. Problem identified: Of late, discovering the required information in massive quantities of data has become a challenging task. The model of similarity evaluation is the central element in accomp...
Journal title:
- Int. J. Hybrid Intell. Syst.
Volume 1
Publication date: 2004